Features Extraction Algorithm from SGML for Classification

Authors

  • Zailani Abdullah
  • Muhammad Suzuri Hitam
Abstract

The basic phases of text categorization are preprocessing features, extracting relevant features against the features in a database, and finally categorizing a set of documents into predefined categories. Most research in text categorization focuses on the development of algorithms and computing techniques. The algorithm for preprocessing features tends to be treated as a "black box" and is largely ignored. It is therefore worthwhile to develop a preprocessing algorithm that beginners can use before going deeper into the field of text categorization. This research proposes an algorithm for preprocessing features built on Microsoft .NET Framework technology. The implementation shows that the algorithm can extract the features of interest from a standard corpus and upload them into a relational database.

Keywords: preprocessing, text categorization, algorithm, .NET
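The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it describes: pull selected fields out of SGML-marked documents, reduce the body text to terms, and load the result into a relational database. It is written in Python with SQLite rather than the .NET technology the authors used. The tag names (`DOC`, `TITLE`, `TOPIC`, `BODY`), the stopword list, the table layout, and the sample text are all hypothetical and chosen for illustration.

```python
import re
import sqlite3

# Hypothetical SGML-style sample; the tag names are assumptions, not the paper's corpus format.
SAMPLE_SGML = """
<DOC ID="1"><TITLE>Cocoa prices rise</TITLE><TOPIC>cocoa</TOPIC>
<BODY>Cocoa prices rose sharply after the harvest.</BODY></DOC>
<DOC ID="2"><TITLE>Grain exports fall</TITLE><TOPIC>grain</TOPIC>
<BODY>Grain exports fell as the demand weakened.</BODY></DOC>
"""

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "as", "after", "a", "an", "of", "and"}

DOC_RE = re.compile(r'<DOC ID="(\d+)">(.*?)</DOC>', re.DOTALL)


def field(block, tag):
    """Return the text inside <tag>...</tag>, or an empty string if absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", block, re.DOTALL)
    return match.group(1).strip() if match else ""


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_documents(sgml):
    """Extract the fields of interest from every DOC element."""
    docs = []
    for doc_id, block in DOC_RE.findall(sgml):
        docs.append({
            "id": int(doc_id),
            "title": field(block, "TITLE"),
            "topic": field(block, "TOPIC"),
            "terms": tokenize(field(block, "BODY")),
        })
    return docs


def load_into_database(conn, docs):
    """Store documents and their terms in two relational tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS document "
        "(id INTEGER PRIMARY KEY, title TEXT, topic TEXT)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS term (doc_id INTEGER, term TEXT)")
    for doc in docs:
        conn.execute(
            "INSERT INTO document VALUES (?, ?, ?)",
            (doc["id"], doc["title"], doc["topic"]),
        )
        conn.executemany(
            "INSERT INTO term VALUES (?, ?)",
            [(doc["id"], term) for term in doc["terms"]],
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
docs = extract_documents(SAMPLE_SGML)
load_into_database(conn, docs)
```

Keeping extraction and database loading in separate functions means the same extracted records could be sent to a different store. A regular expression is enough for this flat sample, but real SGML with nested or unclosed tags would need a proper parser.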


Similar resources

Optimal Feature Extraction for Discriminating Raman Spectra of Different Skin Samples using Statistical Methods and Genetic Algorithm

Introduction: Raman spectroscopy, a spectroscopic technique based on inelastic scattering of monochromatic light, can provide valuable information about molecular vibrations, so this technique can be used to study molecular changes in a sample. Material and Methods: In this research, 153 Raman spectra were obtained from normal and dried skin samples. Baseline and electrical noise were eliminat...


Heart Rate Variability Classification using Support Vector Machine and Genetic Algorithm

Background: An electrocardiogram (ECG) is an electrical signal that represents cardiac activity. Heart rate variability (HRV), the variation of the interval between two consecutive heartbeats, represents the balance between the sympathetic and parasympathetic branches of the autonomic nervous system. Objective: In this study, we aimed to evaluate the efficiency of discrete wavelet transfo...


Classification of ECG signals using Hermite functions and MLP neural networks

Classification of heart arrhythmia is an important step in developing devices for monitoring the health of individuals. This paper proposes a three-module system for classification of electrocardiogram (ECG) beats. These modules are a denoising module, a feature extraction module, and a classification module. In the first module, the stationary wavelet transform (SWF) is used for noise reduction of ...


Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction, and document classification. A document image is enhanced in the preprocessing step by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...


Object-Oriented Method for Automatic Extraction of Road from High Resolution Satellite Images

As the information carried in a high spatial resolution image is not represented by single pixels but by meaningful image objects, which include the association of multiple pixels and their mutual relations, the object based method has become one of the most commonly used strategies for the processing of high resolution imagery. This processing comprises two fundamental and critical steps towar...


Classification of polarimetric radar images based on SVM and BGSA

Classification of land cover is one of the most important applications of radar polarimetry images. The purpose of image classification is to classify image pixels into different classes based on vector properties of the extractor. Radar imaging systems provide useful information about ground cover by using a wide range of electromagnetic waves to image the Earth's surface. The purpose ...




Publication date: 2007